EACL 2009 Proceedings of the EACL 2009 Workshop on Language Technologies for African Languages

Authors

  • Guy De Pauw
  • Gilles-Maurice de Schryver
  • Lori Levin
Abstract

We describe the Lwazi corpus for automatic speech recognition (ASR), a new telephone speech corpus which includes data from nine Southern Bantu languages. Because of practical constraints, the amount of speech per language is relatively small compared to major corpora in world languages, and we report on our investigation of the stability of the ASR models derived from the corpus. We also report on phoneme distance measures across languages, and describe initial phone recognisers that were developed using this data.


Similar resources

EACL 2009 Proceedings of the EACL 2009 Workshop on Computational Linguistic Aspects of Grammatical Inference

Preface: We are delighted to present you with this volume containing the papers accepted for presentation at the ... We want to acknowledge the help of the PASCAL 2 network of excellence. Thanks also to Damir Ćavar for giving an invited talk and to the programme committee for the reviewing and advising. We are indebted to the general chair of EACL 2009, Alex Lascarides, to the publication chairs,...


Proceedings of the EACL 2009 Workshop on GEMS : GEometrical Models of Natural Language Semantics Endorsed by the Association for Computational Linguistics

We propose an approach to corpus-based semantics, inspired by cognitive science, in which different semantic tasks are tackled using the same underlying repository of distributional information, collected once and for all from the source corpus. Task-specific semantic spaces are then built on demand from the repository. A straightforward implementation of our proposal achieves state-of-the-art ...


Language ID in the Context of Harvesting Language Data off the Web

As the arm of NLP technologies extends beyond a small core of languages, techniques for working with instances of language data across hundreds to thousands of languages may require revisiting and recalibrating the tried and true methods that are used. One of the NLP techniques that has been treated as “solved” is language identification (language ID) of written text. However, we argue that languag...


Revisiting Multi-Tape Automata for Semitic Morphological Analysis and Generation

Various methods have been devised to produce morphological analyzers and generators for Semitic languages, ranging from methods based on widely used finite-state technologies to very specific solutions designed for a specific language or problem. Since the earliest proposals of how to adopt the elsewhere successful finite-state methods to root-and-pattern morphologies, the solution of encoding Se...


The Universität Karlsruhe Translation System for the EACL-WMT 2009

In this paper we describe the statistical machine translation system of the Universität Karlsruhe developed for the translation task of the Fourth Workshop on Statistical Machine Translation. The state-of-the-art phrase-based SMT system is augmented with alternative word reordering and alignment mechanisms as well as optional phrase table modifications. We participate in the constrained conditio...




Publication date: 2009